PUB-NO: EP001065607A2 



DOCUMENT-IDENTIFIER: EP 1065607 A2 



TITLE: 



System and method of predicting a user's rating for an 
item in a collaborative filtering system 



PUBN-DATE: 



January 3, 2001 



INVENTOR-INFORMATION: 
NAME 

GLANCE, NATALIE S 
DARDENNE, MANFRED 



COUNTRY 
FR 



ASSIGNEE-INFORMATION: 
NAME 

XEROX CORP 



COUNTRY 
US 



APPL-NO: EP00305375 
APPL-DATE: June 26, 2000 



PRIORITY-DATA: US34286299A ( June 29, 1999) 
INT-CL (IPC): G06F017/30, G06F017/60 
EUR-CL (EPC): G06Q030/00 



ABSTRACT: 

A system and method of
predicting a user's rating of a new item in a collaborative filtering system in 
which an initial set of correlation coefficients for the intended users is used 
to bootstrap the system is described. The users are members of a predetermined 
organization and the initial correlation coefficient for each pair of users is 
based on the organizational relationship between the users. Prior 
organizational relationship information pertaining to the strength of ties, 
such as a formal organization chart and social network maps built using 
interviews or deduced from observed (online and/or off line) interaction 
patterns between potential users, is used to bootstrap the filtering system. 
Correlation coefficients can be updated as users rate or rerate items in the 
system.

Description

[0001] This invention relates generally to automatic collaborative filtering systems for predicting a user's level of 
interest in new information, and more particularly to a system and method of bootstrapping or cold starting a collabora- 
tive filtering system. 

[0002] The amount of information that is available globally, via the World Wide Web or the Internet, or locally on 
some Intranets, is so large that managing such information is critical. One way of managing and distributing information 
is to use a collaborative filtering system to predict a user's preference and use that information to distribute new infor- 
mation to the user. 

[0003] Collaborative filtering, the sharing of knowledge through recommendations, is an important vehicle for dis- 
tributing information. There are two distinct types of collaborative filtering mechanisms: those which enable active col- 
laborative filtering by making it easier for people to share pointers to interesting documents and those which automate 
collaborative filtering by using statistical algorithms to make recommendations based on correlations between personal 
preferences. 

[0004] Collaborative filtering systems are of particular value to suppliers of goods and services in that they can be 
used to enhance the distribution of their goods and services to customers. Automated collaborative filtering (ACF) is a 
general type of statistical algorithm that matches items (such as movies, books, music, news articles, etc.) to users by 
first matching users to each other. ACF uses statistical algorithms to make recommendations based on correlations 
between personal preferences. Recommendations usually consist of numerical ratings input manually by users, but can 
also be deduced from user behavior (e.g., time spent reading a document, actions such as printing, saving or deleting 
a document). The premise of such systems is that a user is going to prefer an item that is similar to other items chosen 
by the user and by other users. 

[0005] US-A-5724567 and US-A-5704017 illustrate examples of the prior art.

[0006] However, automated collaborative filtering systems such as the above suffer from the cold-start problem: 
early users will receive inaccurate predictions until there is enough usage data for the algorithm to be able to learn their 
preferences. In prospective applications of ACF technology, such as knowledge management tools for organizations, 
consistent high quality service is key. Many existing systems which employ ACF (MovieLens, Amazon.com,
BarnesandNoble, etc.) either require users to rate a number of items before they will provide recommendations, use data
from purchases, or provide initial predictions which are not personalized (e.g., use the average rating). 
[0007] Knowledge Pump is a Xerox system which employs a push methodology of sharing knowledge where users 
are connected by a central knowledge repository with software tracking their interests and building up information that 
is sent to appropriate users. In Knowledge Pump the system is seeded with a skeletal social network of the intended 
users, a map of the organization's domains of interest and a collection of recommended items. For example, user-pro- 
vided lists of immediate contacts or friends - people whose opinion the user tends to particularly value - may be used. 
See Glance et al., "Knowledge Pump: Supporting the Flow and Use of Knowledge," in Information Technology for
Knowledge Management, Eds. U. Borghoff and R. Pareschi, New York: Springer-Verlag, pp. 35-45, 1998.
[0008] Note that even when ACF is feasible, it does not necessarily yield accurate predictions. The accuracy of the 
prediction depends on the number of items rated in common by pairs of users X and Y, the number of ratings available 
for the item and the number of other items each rater of that item has rated. 

[0009] In many systems such cold-starting techniques are not always acceptable to users. Few users want to take 
the time to provide initial ratings and thus may lose interest in using the system. In some systems using "average data"
may not be useful in providing recommendations. Other systems, especially new systems, may have no related data 
from which to extrapolate a user's interests or no means of acquiring the seed data. 

[0010] There is a need for a system and a method of bootstrapping an ACF system that provides accurate esti- 
mates beginning with initial operation of the system. There is also a need for a system and method of bootstrapping an 
ACF system that is easily updated as users continue to use the system and method. There is a need for a system and 
method of bootstrapping an ACF system that is particularly useful for Intranets. 

[0011] A method of predicting a user's rating of a new item in a collaborative filtering system, according to the inven- 
tion, includes providing an initial set of correlation coefficients for the intended users. The users are members of a pre- 
determined organization and the correlation coefficient for each pair of users is based on the organizational relationship 
between the users. Once the system is seeded with a set of correlation coefficients for the intended users, when a new 
item is presented, the system calculates a prediction for the item. If other users in the system have rated the item, a predicted
user rating is calculated. The predicted user rating calculation is the weighted average of all ratings for the item,
using the correlation coefficients. 

[0012] In an organizational setting, there are many kinds of prior organizational relationship information available 
concerning the population of users. One such predetermined organizational relationship includes the strength of ties 
between potential users. Examples of organizational data include a formal organization chart and social network maps 
built using interviews or deduced from observed (online and/or offline) interaction patterns. Such data is generally readily
available in an Intranet setting, and may also be inferred for an Internet setting.

[0013] A collaborative filtering system for predicting a user's rating for an item, according to the invention includes 
a memory and a processor. The memory stores a correlation coefficient for each user in the system or the data neces- 
sary for calculating the correlation coefficients. The correlation coefficient is a measure of the similarity in ratings 
between pairs of users in the system who have rated a particular item. The memory also stores ratings for the item 
made by other users in the system. The processor calculates the weighted average of all the ratings for the item, 
wherein the weighted average is the sum of the product of a rating and its respective correlation coefficient divided by 
the sum of the correlation coefficients to provide a predicted user rating. The users are members of a predetermined 
organization and the initial value of the correlation coefficient for each pair of users in the system comprises a prede- 
termined organizational relationship among the users. 

[0014] Once the collaborative filtering system is up and running, the initial values for the correlation coefficients can 
be updated as users provide ratings to items. To provide further accuracy in the correlation coefficients, and thus in the 
resulting prediction and recommendation, the correlation coefficients can be updated when a user changes his/her rat- 
ing for a particular item. This is accomplished by backtracking, i.e., removing the prior rating and replacing it with the 
new rating, then recalculating the affected correlation coefficients. 

[0015] Preferably, ratings are provided in the form of enumerated values (such as 0, 1, 2, 3, 4, 5). This guarantees that
correlations are always defined (no division by zero). Also, preferably, predictions are calculated about a threshold value
or constant (such as the midpoint or average of the enumerated values, 2.5).

[0016] Some examples of methods and systems according to the invention will now be described with reference to 
the accompanying drawings, in which: 

Figure 1 is a block diagram of a collaborative filtering system; 

Figure 2 is a flow chart of a method of predicting a user's rating of a new item; 

Figure 3 is a flow chart of an overall method of updating the correlation coefficients; 

Figure 4 is a flow chart of a method of updating the correlation coefficients to take into account a new user rating; 
Figure 5 is a flow chart of a method of backtracking the correlation coefficients after a user has rerated a previously
rated item; and

Figure 6 is a flow chart of another overall method of updating the correlation coefficients.

[0017] Referring now to the drawings, and in particular to Figure 1, a collaborative filtering system according to the
invention is generally shown and indicated by reference numeral 10. ACF system 10 includes processor 12 and memory
18. Correlation coefficients $\alpha_{xy}$ and ratings ($S_x^i$, $S_y^i$) provided by the various users and others are stored in memory
18. When the system 10 first starts up, only the initial correlation coefficients $\alpha_{xy}(0)$ need be stored. Subsequent values of
$\alpha_{xy}$ can be stored or calculated in real time from the ratings ($S_x^i$, $S_y^i$) and the initial values $\alpha_{xy}(0)$. However, it should be
noted that for incremental update, the correlation values need to be stored. Initially, the $\alpha_{xy}(0)$ for cold-starting the ACF
system are provided in a manner described below. Initial ratings ($S_x^i$, $S_y^i$) may also be provided. The processor 12
performs various calculations described below: user prediction, correlation coefficient update, and user-applied ratings.
[0018] A group of users 20 may request and receive recommendations from, and provide ratings to, ACF system
10 via interface 24. Interface 24 may be via an intranet or the Internet.

Prediction Method 

[0019] User to user correlations are an essential part of an ACF system. A standard approach to ACF is to use the 
Pearson r Algorithm to calculate user to user correlations (correlation coefficients) from ratings data that has been pro- 
vided by the users. These correlation coefficients are then used to predict a user's score for an item as a correlation- 
weighted average of all the ratings for the item. The standard Pearson r correlation coefficient used to measure the similarity
between users from items they have both rated is given by the relationship:

$$r_{xy} = \frac{\sum (S_x - \bar{S}_x)(S_y - \bar{S}_y)}{\sqrt{\sum (S_x - \bar{S}_x)^2} \times \sqrt{\sum (S_y - \bar{S}_y)^2}} \qquad (1)$$

where the sum is over all N items that users X and Y have rated in common, $S_x$ is X's rating for item S and $S_y$ is Y's
rating for item S. The coefficient r ranges from -1, which indicates a negative correlation, to 0, which indicates no correlation,
to 1, which indicates a positive correlation between both users. Many systems employing ACF tend to use a
variation of this coefficient called the Constrained Pearson r Algorithm, which takes into account variations of ratings
about a fixed point (usually the midpoint of the range). For example, if ratings range from 0 to 5, the constrained Pearson
r coefficient would take into account variations around 2.5. Thus the relationship for determining the correlation coefficient
$\alpha_{xy}$ between user X and user Y about a fixed point $P_0$ is given by:

$$\alpha_{xy} = \frac{\sum (S_x - P_0)(S_y - P_0)}{\sqrt{\sum (S_x - P_0)^2} \times \sqrt{\sum (S_y - P_0)^2}} \qquad (2)$$




[0020] For a group of users, sometimes referred to as a community of users, such as in an Intranet setting, a set of
values $\{\alpha_{xy}\}$ would be obtained, with separate correlation coefficients for each pair of users (X,Y).
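As an illustration only, the constrained Pearson relationship (2) might be computed as in the following sketch. The function name, the dictionary layout of the ratings and the sample users are assumptions made for the example and do not come from the described system; returning None when no item is rated in common is likewise a choice made here.

```python
import math


def constrained_pearson(ratings_x, ratings_y, p0=2.5):
    """Correlation of two users about the fixed point p0, over co-rated items.

    ratings_x and ratings_y map an item identifier to an enumerated rating.
    Returns None when the users have no item in common (hypothetical choice).
    """
    common = ratings_x.keys() & ratings_y.keys()
    if not common:
        return None
    num = sum((ratings_x[i] - p0) * (ratings_y[i] - p0) for i in common)
    den_x = math.sqrt(sum((ratings_x[i] - p0) ** 2 for i in common))
    den_y = math.sqrt(sum((ratings_y[i] - p0) ** 2 for i in common))
    # With integer ratings 0..5 and p0 = 2.5 every deviation is non-zero,
    # so neither denominator can vanish.
    return num / (den_x * den_y)


# Hypothetical ratings: three documents rated in common, one rated by Y only.
x = {"doc1": 5, "doc2": 4, "doc3": 0}
y = {"doc1": 4, "doc2": 5, "doc3": 1, "doc4": 3}
alpha_xy = constrained_pearson(x, y)
```

Because integer ratings never coincide with the midpoint 2.5, each squared deviation is at least 0.25, which is why the enumerated scale keeps the coefficient defined.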
[0021] To predict a user X's score for an item i, ACF calculates a weighted average of all the ratings on the particular
item provided by other users Y in the system according to the following formula for the prediction, $P_x^i$:

$$P_x^i = P_0 + \frac{\sum_{y \in \text{raters of } i} (S_y^i - P_0)\,\alpha_{xy}}{\sum_{y} |\alpha_{xy}|} \qquad (3)$$


[0022] Preferably only enumerated values are used for the ratings (0, 1, 2, 3, 4, 5) to ensure that the correlations are
always defined (no division by zero).
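A minimal sketch of the correlation-weighted prediction follows, assuming ratings and correlations are held in plain dictionaries. The function name, the sample users, the normalization by the sum of absolute correlation values, and the fallback to P_0 when no correlated user has rated the item are all assumptions made for this example.

```python
def predict_rating(ratings_for_item, alpha_x, p0=2.5):
    """Predict user X's rating of one item.

    ratings_for_item: {other_user: rating of the item}
    alpha_x:          {other_user: correlation coefficient with user X}
    """
    num = 0.0
    den = 0.0
    for y, s in ratings_for_item.items():
        a = alpha_x.get(y)
        if a is None:
            continue  # no correlation known for this rater
        num += (s - p0) * a
        den += abs(a)  # assumption: normalize by absolute weights
    if den == 0.0:
        return p0  # assumption: fall back to the fixed point
    return p0 + num / den


# Hypothetical raters of a single item and their correlations with X.
ratings = {"yan": 5, "zoe": 1}
alpha = {"yan": 0.8, "zoe": 0.2}
p = predict_rating(ratings, alpha)
recommend = p > 2.5
```

In this example the highly correlated rater dominates, so the prediction lands well above the midpoint even though the second rater scored the item low.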

[0023] Once a prediction for item i is obtained, the prediction can be used by a recommender system to make a
recommendation to the user X. For example, if $P_x^i$ is less than the average or midpoint of the ratings, $P_0$, the recommender
system would likely not recommend item i to user X. If $P_x^i$ is greater than $P_0$, the recommender system
would likely recommend the item i to user X. The recommender system may also use the predictions to rank the recommended
items shown to the user.

[0024] Typically this prediction relationship (3) is modified to take into account only correlations above a certain
threshold. In addition, in very large systems (especially those involving a community of users on the Internet), in order
to reduce the computational complexity of this relationship, the sum is taken over a neighborhood of the N most highly
correlated raters.
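The thresholded, neighborhood-limited variant could be sketched as below. The parameter names min_alpha and max_neighbors, their default values and the sample data are hypothetical; the source does not state a particular threshold or neighborhood size.

```python
import heapq


def predict_with_neighborhood(ratings_for_item, alpha_x, p0=2.5,
                              min_alpha=0.1, max_neighbors=50):
    """Predict X's rating using only the most highly correlated raters."""
    # Keep raters whose correlation with X exceeds the threshold.
    candidates = [(alpha_x[y], s) for y, s in ratings_for_item.items()
                  if y in alpha_x and alpha_x[y] > min_alpha]
    # Restrict the sum to the max_neighbors largest correlations.
    neighbors = heapq.nlargest(max_neighbors, candidates,
                               key=lambda pair: pair[0])
    den = sum(a for a, _ in neighbors)
    if den == 0:
        return p0  # assumption: no usable neighbor, return the fixed point
    return p0 + sum((s - p0) * a for a, s in neighbors) / den


# Hypothetical raters; "b" falls below the threshold and is ignored.
ratings = {"a": 5, "b": 0, "c": 4}
alpha = {"a": 0.9, "b": 0.05, "c": 0.5}
p_top1 = predict_with_neighborhood(ratings, alpha, max_neighbors=1)
p_top2 = predict_with_neighborhood(ratings, alpha, max_neighbors=2)
```

Selecting the neighborhood with a heap avoids sorting every rater, which matters only when the community is large.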

Correlation Coefficients 

[0025] When a recommender system employing an ACF system is first set up, there are no ratings data and thus
no way to calculate user to user correlations. In order to calculate the correlation between two users, X and Y, they must
have rated at least one item in common. Early on in the system's use and operation, it is very likely that when user Z
rates an item i, it will not be possible to predict X's interest in that item, since X and Z have not rated anything in common.
As more users in the system rate the item i and as X rates more items, the likelihood that X has rated some item
in common with some user who has rated the item i increases.

[0026] Thus the ACF system must be bootstrapped with a set of user to user correlation coefficients and a set of
prior user ratings. There are several preferred methods of estimating the initial values of user to user correlations,
$\alpha_{xy}(0)$, and they may be obtained according to one or a combination of the methods described below.
[0027] The first method is to use a formal organizational chart among the intended users and scale the correlation
by the number of steps, n, required to reach a common ancestor:

$$\alpha_{xy}(0) = \alpha_0^n, \quad \text{where } 0 < \alpha_0 < 1. \qquad (4)$$

For a pair of users, X and Y, with a common ancestor (e.g., manager), n = 0. For a pair of users, where one user is an
ancestor for the other (e.g., manager and managee), n is the number of levels separating the two. In terms of the prediction
relationship, Equation (3), this scaling behavior implies that (at least initially) the average opinion of a cluster of
size $1/\alpha_0$ has the same weight as the opinion of a single person who is one step closer in the organization. Thus, someone in the
same workgroup is hypothesized to have the same influence as several people in another workgroup.
[0028] A second method is to provide a map of the competencies in an organization. This map can be used alone 
or in combination with the correlation relationship (4) to complement the organizational structure. The approach to collaborative
filtering described herein, in fact, is community-centered. That is, the prediction process described above is
iterated over a set of communities (or domains of interest). The bootstrapped correlation values can be different for
each community. While the organizational chart will stay the same, the set of experts (higher ranked persons in the 
organization) will depend on the community. By default, all non-expert users would be given a higher default correlation 
with the experts, with the amount of correlation proportional to the expert's perceived level of competence. An advan- 
tage of taking into account experts is that they cut across the formal organizational structure. 
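The organization-chart scaling of relationship (4) can be sketched as follows. The step counts are supplied by the caller, since how n is read off a particular chart is left to the description above; the function names, the value 0.5 for the base coefficient and the sample users are assumptions for illustration only.

```python
def initial_correlation(n_steps, alpha0=0.5):
    """Seed correlation for a pair separated by n_steps in the chart."""
    if not 0 < alpha0 < 1:
        raise ValueError("alpha0 must lie strictly between 0 and 1")
    if n_steps < 0:
        raise ValueError("n_steps must be non-negative")
    return alpha0 ** n_steps


def seed_correlations(steps_between, alpha0=0.5):
    """Build a symmetric table of seed correlations.

    steps_between maps a (user_x, user_y) pair to its step count n.
    """
    seeded = {}
    for (x, y), n in steps_between.items():
        a = initial_correlation(n, alpha0)
        seeded[(x, y)] = a
        seeded[(y, x)] = a  # the seed is the same in both directions
    return seeded


# Hypothetical step counts taken from an organization chart.
steps = {("ana", "ben"): 0, ("ana", "cho"): 1, ("ana", "dee"): 2}
seeded = seed_correlations(steps)
```

With a base coefficient of 0.5, two users at n = 2 together carry the same seed weight as one user at n = 1, which illustrates how the weight decays geometrically with organizational distance.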

[0029] Another method of bootstrapping correlation values is to ask users who their advisors are, that is, whose
opinions they particularly trust and respect. These advisors could then be given an initially very high correlation with the
user. Conversely, users could be asked for those people whose opinion they do not trust (a kind of "kill" file or contrary
indicator). These "advisors" would be assigned very low default correlations. More generally, the system could include
a method of allowing users to specify default correlations between themselves and anyone else in the organization. This
same method could be used to dynamically "re-set" or tweak correlation values that the system has learned for the user.
[0030] The above methods can be easily combined, so that, for example, the initial correlation values are obtained 
first from the organizational chart and expert yellow pages and then modified according to user input. This combination 
can be thought of as an extended organization chart with informal links. 
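One way such a combination might be layered is sketched below: organization-chart seeds first, then expert values, then user-specified values. The function name, the rule of taking the larger of the chart value and the expert value, and the sample numbers are assumptions of this sketch; the description does not prescribe a particular merging rule.

```python
def combine_seeds(org_chart, expert_boosts, user_overrides):
    """Merge three sources of seed correlations for (user, other) pairs.

    org_chart:      seeds derived from the formal organization chart
    expert_boosts:  seeds derived from perceived competence of experts
    user_overrides: correlations specified directly by users
    """
    combined = dict(org_chart)
    for pair, value in expert_boosts.items():
        # Assumption: an expert never lowers the chart-based seed.
        combined[pair] = max(combined.get(pair, 0.0), value)
    for pair, value in user_overrides.items():
        if not -1.0 <= value <= 1.0:
            raise ValueError("correlation must lie in [-1, 1]")
        combined[pair] = value  # user input is applied last
    return combined


# Hypothetical inputs: "cho" is an expert, and "ana" distrusts "ben".
org = {("ana", "ben"): 0.5, ("ana", "cho"): 0.25}
experts = {("ana", "cho"): 0.6}
overrides = {("ana", "ben"): 0.05}
seeds = combine_seeds(org, experts, overrides)
```

Applying user input last mirrors the order given above, in which the chart and expert values are obtained first and then modified according to user input.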

[0031] Obtaining organizational information in an Intranet setting, as noted above, should be easier than in a community
of users across the Internet. In an Internet setting (and in an Intranet setting), data mining techniques may be
employed to infer social relationships from co-citations (WWW pages, papers, etc.) and from browsing patterns (WWW,
digital libraries). The strength of social ties inferred using these techniques could then be used to form the initial values
for the correlation coefficients. When generating correlation coefficients from such techniques, care should be taken not
to create too much of an initial bias towards individuals who are better represented electronically. To the extent the use
of information technology may not be widespread enough in some organizations, data mining techniques may have to
be supplemented in order to build a complete enough picture of the social network.

[0032] Referring to Fig. 2, a flow chart of the steps in making a recommendation using an ACF system is shown. In
step 102, a set of initial correlation coefficients $\alpha_{xy}(0)$ is provided using one of the methods described above. In step
104, a set of prior user ratings (or estimates) is also provided. Preferably, the variance $\sigma_0$ for each user is also provided
as part of the prior ratings. In step 106 a new item i, which has been rated by at least some of the other users in the
community, is provided. In step 108 a prediction of how user X will rate item i is made. In step 110, if the rating is greater
than a threshold value $P_0$, then the recommender system makes a recommendation to user X in step 112. If not, the
recommender system does not make a recommendation in step 114.

Updating the Correlation Coefficients

[0033] As users continue to use the system, the user to user correlation values $\alpha_{xy}(0)$ must be updated to incorporate
the ratings made by users for the same items in the system. A method for updating the correlation values using
data about ratings for documents rated by X and Y as such data becomes available is also described below. To update
the correlation value, the update relationship, with a parameterized weight accorded to the initial value, is obtained.
[0034] In addition to being a way to learn the correlation between X and Y over time, the update formula is a much
more efficient way to calculate the correlation coefficient. Every time there is a new rating by X on an item that has also
been rated by Y, the new correlation coefficient is calculated from the previous value using the update relationship.
Referring to Figure 3, a flow chart of the overall method of updating the correlation coefficients is shown. For each
item i in step 120, a check is made in step 122 to determine if user X has provided a rating for item i. If not, the routine
ends at step 124. If the user X does rate item i, the routine checks if X has previously rated the item i at step 126. If yes,
the system backtracks the $\alpha_{xy}$ for all affected users Y in step 128 (i.e., the system replaces the old rating for i with the new
rating). Then in step 130 the routine updates the $\alpha_{xy}$ for all affected users Y who have rated item i in common with user
X.

[0035] The update formula is obtained by expanding the correlation coefficient relationship (Equation 2) as a function
of the previous correlation coefficient, the new rating pair, the variance of X's distribution of ratings on items in common
with Y and the variance of Y's distribution of ratings in common with X (not including the new rating). The update
relationship for the correlation coefficient is:

$$\alpha_{xy}(T_{xy}+1) = \frac{N\,\sigma_x(T_{xy})\,\sigma_y(T_{xy})\,\alpha_{xy}(T_{xy}) + (S_x - P_0)(S_y - P_0)}{(N+1)\,\sigma_x(T_{xy}+1)\,\sigma_y(T_{xy}+1)} \qquad (5)$$

where

$$\sigma_x(T_{xy}+1) = \frac{\sqrt{N\,\sigma_x^2(T_{xy}) + (S_x - P_0)^2}}{\sqrt{N+1}}, \qquad \sigma_y(T_{xy}+1) = \frac{\sqrt{N\,\sigma_y^2(T_{xy}) + (S_y - P_0)^2}}{\sqrt{N+1}} \qquad (6)$$


are the individual update formulas for X's and Y's ratings distributions, respectively, over items rated in common. In
equations (5) and (6), N is shorthand for a function of X, Y and $T_{xy}$: $N(T_{xy}) = N_0 + T_{xy}$. $N_0$ is a parameter of the
system which reflects the weight attributed to the prior estimate of the user-user correlation. In this embodiment with
enumerated user choices (0, 1, 2, 3, 4, 5 and $P_0 = 2.5$), we use a value for $N_0$ of between 5 and 10. Each time there is a
new item which both X and Y have rated, $T_{xy}$ is incremented by one, starting from a value of 0 initially.
[0036] The update equation, Equation (5), is seeded with prior estimates of user to user correlations $\alpha_{xy}(T_{xy} = 0)$.
Examples of preferred estimates for seeding the update equation (5) are described above. The update equation, Equation
(5), is also seeded with a prior estimate of the standard deviation in users' ratings, which is taken to be independent
of the user: $\sigma_x(T = 0) = \sigma_y(T = 0) = \sigma_0$ for all users X and Y. The value for $\sigma_0$ could be estimated using ratings data
sets taken from other systems. The initial value chosen for $\sigma_0$ is expected to be about 20% of the range of the
rating scale (i.e., somewhere between 0.5 and 1.5 for a six-point rating scale that goes from 0 to 5 as in the example
above).
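A sketch of the seeded incremental update follows, under the reading that a pair's spread is kept as a root-mean-square deviation about P_0, so that each new rating pair folds into the running values with weight N = N_0 + T_xy. The PairState record, the function name and the numeric seeds are assumptions made for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class PairState:
    alpha: float    # current correlation for the pair (X, Y)
    sigma_x: float  # spread of X's ratings on co-rated items, about p0
    sigma_y: float  # spread of Y's ratings on co-rated items, about p0
    t: int = 0      # number of items rated in common so far


def update(state, s_x, s_y, n0=5, p0=2.5):
    """Fold one new co-rated item into the pair's running correlation."""
    n = n0 + state.t
    dx, dy = s_x - p0, s_y - p0
    sigma_x = math.sqrt((n * state.sigma_x ** 2 + dx ** 2) / (n + 1))
    sigma_y = math.sqrt((n * state.sigma_y ** 2 + dy ** 2) / (n + 1))
    alpha = (n * state.sigma_x * state.sigma_y * state.alpha + dx * dy) / (
        (n + 1) * sigma_x * sigma_y)
    return PairState(alpha, sigma_x, sigma_y, state.t + 1)


# Hypothetical seed: correlation 0.5 from the chart, sigma_0 = 1.0.
seed = PairState(alpha=0.5, sigma_x=1.0, sigma_y=1.0)
after_one = update(seed, s_x=5, s_y=4)

# With no prior weight (n0 = 0) the running value matches the batch
# constrained Pearson coefficient over the same rating pairs.
state = PairState(alpha=0.0, sigma_x=1.0, sigma_y=1.0)
for s_x, s_y in [(5, 4), (4, 5), (0, 1)]:
    state = update(state, s_x, s_y, n0=0)
```

A larger N_0 makes the seeded correlation harder to move, so the first few agreeing or disagreeing ratings shift it only modestly.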

[0037] Referring to Figure 4, when a user provides a new rating for an item, the update routine 130 for updating $\alpha_{xy}$
is used. In step 132 the current value for $\alpha_{xy}(T_{xy})$ is obtained. Then the standard deviations $\sigma_x$ and $\sigma_y$ are updated in step
134 using Equations (6). Then $\alpha_{xy}(T_{xy}+1)$ is calculated in step 136 using Equation (5). At step 138 the routine checks
if the $\alpha_{xy}$ have been updated for all users Y who have rated an item in common with user X. Once all the applicable $\alpha_{xy}$
have been updated, the routine ends at step 140.

[0038] To ensure an even more accurate update of the prediction relationship, the case in which either X or Y 
revises their rating must be considered. In this case, in order to calculate the current correlation coefficient between the 
two users X and Y, a backtrack formula must be used. The backtrack formula removes the effect of the previous rating 
pair and then updates the correlation coefficient in order to take into account the revised rating. The backtrack formula 
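is:

$$\alpha_{xy}(T_{xy}-1) = \frac{N\,\sigma_x(T_{xy})\,\sigma_y(T_{xy})\,\alpha_{xy}(T_{xy}) - (S_x - P_0)(S_y - P_0)}{(N-1)\,\sigma_x(T_{xy}-1)\,\sigma_y(T_{xy}-1)} \qquad (7)$$

Equation (7) depends upon using previously calculated values of the standard deviations in X's and Y's ratings distributions,
which are given by:

$$\sigma_x(T_{xy}-1) = \frac{\sqrt{N\,\sigma_x^2(T_{xy}) - (S_x - P_0)^2}}{\sqrt{N-1}}, \qquad \sigma_y(T_{xy}-1) = \frac{\sqrt{N\,\sigma_y^2(T_{xy}) - (S_y - P_0)^2}}{\sqrt{N-1}} \qquad (8)$$

In this case, $T_{xy}$ is not incremented, and N is shorthand for $N_0 + T_{xy}$. Once the backtrack has been applied, then the
update algorithms of Eqs. (5) and (6) are applied to obtain the revised correlation coefficient.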
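The backtrack followed by a fresh update can be sketched as below. In this sketch the backtrack lowers the stored count by one and the following update raises it again, so the count is unchanged over a complete re-rating; that bookkeeping, the PairState tuple and the function names are assumptions of the example, not details stated in the description.

```python
import math
from typing import NamedTuple


class PairState(NamedTuple):
    alpha: float
    sigma_x: float
    sigma_y: float
    t: int  # number of items rated in common


def update(state, s_x, s_y, n0=5, p0=2.5):
    """Fold one co-rated item into the running values."""
    n = n0 + state.t
    dx, dy = s_x - p0, s_y - p0
    sx = math.sqrt((n * state.sigma_x ** 2 + dx ** 2) / (n + 1))
    sy = math.sqrt((n * state.sigma_y ** 2 + dy ** 2) / (n + 1))
    alpha = (n * state.sigma_x * state.sigma_y * state.alpha + dx * dy) / (
        (n + 1) * sx * sy)
    return PairState(alpha, sx, sy, state.t + 1)


def backtrack(state, old_s_x, old_s_y, n0=5, p0=2.5):
    """Remove the effect of a previously folded-in rating pair."""
    n = n0 + state.t
    if n <= 1:
        raise ValueError("nothing to backtrack")
    dx, dy = old_s_x - p0, old_s_y - p0
    sx = math.sqrt((n * state.sigma_x ** 2 - dx ** 2) / (n - 1))
    sy = math.sqrt((n * state.sigma_y ** 2 - dy ** 2) / (n - 1))
    alpha = (n * state.sigma_x * state.sigma_y * state.alpha - dx * dy) / (
        (n - 1) * sx * sy)
    return PairState(alpha, sx, sy, state.t - 1)


def rerate(state, old_pair, new_pair, n0=5, p0=2.5):
    """Replace an old rating pair by a revised one."""
    return update(backtrack(state, *old_pair, n0=n0, p0=p0),
                  *new_pair, n0=n0, p0=p0)


# Hypothetical seed, one co-rated item, then X revises 5 down to 1.
seed = PairState(0.5, 1.0, 1.0, 0)
rated = update(seed, 5, 4)
undone = backtrack(rated, 5, 4)
revised = rerate(rated, old_pair=(5, 4), new_pair=(1, 4))
```

Backtracking needs the old rating pair itself, which is why every rating by every user has to be retained alongside the per-pair running values.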
[0039] Referring to Figure 5, when a user provides a new rating for an item which he/she has previously rated, the
backtrack routine 150 for backtracking $\alpha_{xy}$ is used. In step 152 the current value for $\alpha_{xy}(T_{xy})$ is obtained. Then the standard
deviations $\sigma_x$ and $\sigma_y$ are backtracked in step 154 using Equations (8). Then $\alpha_{xy}(T_{xy}-1)$ is calculated in step 156 using
Equation (7). At step 158 the routine checks if the $\alpha_{xy}$ have been backtracked for all users Y who have rated an item in
common with user X. Once all the applicable $\alpha_{xy}$ have been backtracked, the routine ends at step 160.
[0040] In order to use the update relationship for the collaborative filtering system with the bootstrapped values, the
following intermediate values must be saved (in a database in memory 18, for example) for each pair of users X and Y:
$T_{xy}$, $\sigma_x$, $\sigma_y$ and $\alpha_{xy}$. The backtrack relationship requires saving all ratings by all users. Note that $\sigma_x$ and $\sigma_y$ are dependent
on the pair X, Y since they take into account only documents rated by both X and Y. If, instead, the standard deviation
over all items rated by the user (which would allow saving it once for all users instead of once for each pair) is
calculated, the backtrack calculation for the correlation (Equation 7) could not be performed.

[0041] Storing the intermediate values entails a fair amount of memory, which size scales at worst with $M^2$, where
M is the number of users. On the other hand, the correlation calculation decreases in complexity from $M^2 \times D$, where
D is the number of documents or items, to a simple update calculation independent of M and D. Thus, the update algorithm
has the additional benefit of potentially allowing predictions to be updated dynamically, instead of off-line as is
more typically done.

[0042] Referring to Figure 6, a flowchart of the procedure for updating correlation values and propagating the effect
to the calculation of predictions is shown. For each new rating by user X of an item i the routine at step 180 is called. In
step 182, all users Y who have also rated item i are determined. In step 184, increment and save $T_{xy}$ for all such users
X and Y. In step 186, update and save $\sigma_x$, $\sigma_y$ and $\alpha_{xy}$, indexed by X and Y, using Equations (5) and (6). In step 188, for
all items j ≠ i, which Y has rated, but X has not, predict (and save) X's rating on j using Equation (3). In step 190, for all
items j ≠ i, which X has rated, but Y has not, predict (and save) Y's rating on j using Equation (3). In step 192, find all
users Y where Y has not rated i. In step 194, predict (and save) Y's rating on i using Equation (3).

[0043] For each second or later rating by user X of an item i, step 184 is skipped and step 186 is modified to include
a backtrack calculation (using Equations (7) and (8)) before the update calculation. Note that in place of Equation (3)
the update relationship Equation (5) can also be employed to calculate the predictions. If used, additional intermediate
calculations would need to be saved that entail additional storage that scales with $M \times D$. If storage is limited, the correlations
can be calculated from the ratings data only by iterating over the following revised procedures.

[0044] The modified method is accomplished by iterating over the time-ordered list of ratings. For each new rating
by user X of an item i,

(1) Find all users Y where Y has also rated i.

(2) Increment $T_{xy}$.

(3) Update $\sigma_x$, $\sigma_y$ and $\alpha_{xy}$, indexed by users X and Y.

For subsequent ratings of user X of an item i, step (2) is skipped and step (3) is modified to include a backtrack step before the update. All
predictions for all users on all unrated documents can then be calculated in one pass.
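The replay over a time-ordered ratings list might be sketched as follows. The log format, the sorted-pair keys, the default seed of 0.0 for pairs missing from the seed table, and the single step helper that serves both the update (sign +1) and the backtrack (sign -1) are assumptions of this sketch.

```python
import math


def step(state, s_x, s_y, sign, n0=5, p0=2.5):
    """Apply one update (sign=+1) or one backtrack (sign=-1).

    state is (alpha, sigma_x, sigma_y, t), t being the co-rated item count.
    """
    alpha, sigma_x, sigma_y, t = state
    n = n0 + t
    dx, dy = s_x - p0, s_y - p0
    new_sx = math.sqrt((n * sigma_x ** 2 + sign * dx ** 2) / (n + sign))
    new_sy = math.sqrt((n * sigma_y ** 2 + sign * dy ** 2) / (n + sign))
    new_alpha = (n * sigma_x * sigma_y * alpha + sign * dx * dy) / (
        (n + sign) * new_sx * new_sy)
    return (new_alpha, new_sx, new_sy, t + sign)


def replay(rating_log, seeds, sigma0=1.0, n0=5, p0=2.5):
    """Rebuild pair states from (user, item, rating) entries in time order.

    seeds maps a sorted (user_a, user_b) tuple to its seed correlation.
    """
    latest = {}  # (user, item) -> most recent rating
    raters = {}  # item -> set of users who have rated it
    pairs = {}   # sorted (user_a, user_b) -> state tuple
    for x, item, s_new in rating_log:
        s_old = latest.get((x, item))
        for y in raters.get(item, set()) - {x}:  # users who also rated it
            key = (x, y) if x < y else (y, x)
            state = pairs.get(key, (seeds.get(key, 0.0), sigma0, sigma0, 0))
            s_y = latest[(y, item)]
            if s_old is not None:
                # Re-rating: remove the old pair first; the count is not
                # raised overall because the update below restores it.
                old = (s_old, s_y) if key[0] == x else (s_y, s_old)
                state = step(state, *old, sign=-1, n0=n0, p0=p0)
            new = (s_new, s_y) if key[0] == x else (s_y, s_new)
            pairs[key] = step(state, *new, sign=+1, n0=n0, p0=p0)
        latest[(x, item)] = s_new
        raters.setdefault(item, set()).add(x)
    return pairs


# Hypothetical log: ana rates d1, ben rates d1, then ana revises her rating.
log = [("ana", "d1", 5), ("ben", "d1", 4), ("ana", "d1", 1)]
pairs = replay(log, seeds={("ana", "ben"): 0.5})
alpha, sigma_ana, sigma_ben, t = pairs[("ana", "ben")]
```

Only the latest rating per user and item is held during the replay, so storage stays proportional to the ratings data plus one small record per pair of users with an item in common.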

[0045] It will be appreciated that the present invention may be readily implemented in software using software 
development environments that provide portable source code that can be used on a variety of hardware platforms.
Alternatively, the disclosed system may be implemented partially or fully in hardware using standard logic circuits. 
Whether software or hardware is used to implement the system varies depending on the speed and efficiency require- 
ments of the system and also the particular function and the particular software or hardware systems and the particular 
microprocessor or microcomputer systems being utilized. 


Claims 

1. A method of predicting a user's rating for an item in a collaborative filtering system, comprising:

providing a correlation coefficient for each pair of users in the system, wherein the correlation coefficient is a
measure of the similarity in ratings between pairs of users who have rated a particular item;
determining ratings for items rated by other users in the system;

calculating the weighted average of all the ratings for the item, wherein the weighted average is the sum of the
product of a rating and its respective correlation coefficient divided by the sum of the correlation coefficients to
provide a predicted user rating;

wherein the plurality of users are members of a predetermined organization; and

wherein the correlation coefficient for each user in the system comprises a predetermined organizational relationship
among the users.

2. The method of claim 1, wherein the predetermined organizational relationship comprises an organizational chart
among the users and wherein the initial correlation coefficient between a pair of users is a function of the number
of levels between the pair.

3. The method of claim 1 or claim 2, wherein the predicted user rating of an item i for a user X is calculated in accordance
with the relationship:

$$P_x^i = P_0 + \frac{\sum_{y \in \text{raters}} (S_y^i - P_0)\,\alpha_{xy}}{\sum_{y} |\alpha_{xy}|}$$

where $S_y^i$ is the rating of each user Y who has rated the item i, $P_0$ is a predetermined value and $\alpha_{xy}$ is the correlation
coefficient between the user X and the user Y.

4. The method of claim 3, further comprising:

updating the correlation in accordance with the relationship:

$$\alpha_{xy}(T_{xy}+1) = \frac{N\,\sigma_x(T_{xy})\,\sigma_y(T_{xy})\,\alpha_{xy}(T_{xy}) + (S_x - P_0)(S_y - P_0)}{(N+1)\,\sigma_x(T_{xy}+1)\,\sigma_y(T_{xy}+1)}$$

where

$$\sigma_x(T_{xy}+1) = \frac{\sqrt{N\,\sigma_x^2(T_{xy}) + (S_x - P_0)^2}}{\sqrt{N+1}}, \qquad \sigma_y(T_{xy}+1) = \frac{\sqrt{N\,\sigma_y^2(T_{xy}) + (S_y - P_0)^2}}{\sqrt{N+1}}$$

are the individual update formulas for the user X's and user Y's ratings distributions, respectively over items
rated in common, $T_{xy}(0)$ is the weight attributed to a prior estimate of the user X to user Y correlation, and N is
a function of the users X and Y and $T_{xy}$.

5. The method of claim 3 or claim 4, wherein the initial correlation coefficient for each pair of users X and Y in the system
comprises the relationship

$$\alpha_{xy}(0) = \alpha_0^n, \quad \text{where } 0 < \alpha_0 < 1,$$
wherein n is the number of levels separating user X and user Y.


6. The method of any of the preceding claims, wherein the correlation coefficient for each pair of users in the system 
further comprises user specified correlations between pairs of users in the organization. 

7. The method of any of the preceding claims, further comprising:

receiving a user rating for the item; and

using the user rating to update the user's correlation coefficients with other users.

8. The method of claim 7, further comprising: 


for each updated rating of an item rated in common by X and Y, backtracking to remove the effect of the prior 
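rating.

9. The method of claim 8, wherein the prior rating is removed in accordance with the relationship:

$$\alpha_{xy}(T_{xy}-1) = \frac{N\,\sigma_x(T_{xy})\,\sigma_y(T_{xy})\,\alpha_{xy}(T_{xy}) - (S_x - P_0)(S_y - P_0)}{(N-1)\,\sigma_x(T_{xy}-1)\,\sigma_y(T_{xy}-1)}$$

where

$$\sigma_x(T_{xy}-1) = \frac{\sqrt{N\,\sigma_x^2(T_{xy}) - (S_x - P_0)^2}}{\sqrt{N-1}}, \qquad \sigma_y(T_{xy}-1) = \frac{\sqrt{N\,\sigma_y^2(T_{xy}) - (S_y - P_0)^2}}{\sqrt{N-1}}$$

are the individual update formulas for user X's and user Y's rating distributions, respectively over items rated in
common, $T_{xy}(0)$ is the weight attributed to the prior estimate of the user to user correlation, and N is a function of the
users and $T_{xy}$.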

10. A collaborative filtering system for predicting a user's rating for an item, comprising: 
a memory storing: 



a correlation coefficient for each pair of users in the system, wherein the correlation coefficient is a meas- 
ure of the similarity in ratings between pairs of users in the system who have rated at least one item in com- 
mon; and 

ratings for the item made by other users in the system; 

a processor for calculating the weighted average of all the ratings for the item, wherein the weighted aver- 
age is the sum of the product of a rating and its respective correlation coefficient divided by the sum of the 
correlation coefficients to provide a predicted user rating; 
wherein the users are members of a predetermined organization; and 

wherein the correlation coefficient for each user in the system comprises a predetermined organizational 
relationship among the users. 



11. A system according to claim 10, adapted to carry out a method according to any of claims 1 to 9.